Conservation implications of elucidating the Korean wolf taxonomic ambiguity through whole‐genome sequencing

Abstract The taxonomic status of the now likely extirpated Korean Peninsula wolf has been extensively debated, with some arguing it represents an independent wolf lineage, Canis coreanus. To investigate the Korean wolf's genetic affiliations and taxonomic status, we sequenced and analysed the genomes of a Korean wolf dated to the beginning of the 20th century, and a captive wolf originally from the Pyongyang Central Zoo. Our results indicated that the Korean wolf bears similar genetic ancestry to other regional East Asian populations, therefore suggesting it is not a distinct taxonomic lineage. We identified regional patterns of wolf population structure and admixture in East Asia with potential conservation consequences in the Korean Peninsula and on a regional scale. We find that the Korean wolf has similar genomic diversity and inbreeding to other East Asian wolves. Finally, we show that, in contrast to the historical sample, the captive wolf is genetically more similar to wolves from the Tibetan Plateau; hence, Korean wolf conservation programmes might not benefit from the inclusion of this specimen.

In 2005, Wozencraft reasserted the subspecies C. l. chanco, including C. coreanus Abe, 1923 as one of its synonyms. Importantly, the C. l. chanco subspecies by itself is a well-known case of taxonomic ambiguity that has been frequently discussed (cf. Aggarwal et al., 2003(cf. Aggarwal et al., , 2007Joshi et al., 2020;Shrotriya et al., 2012). At that time, the C. l. chanco subspecies distribution was considered to cover a large area of central and east Asia, as suggested by Sokolov and Rossolimo (1985). Later, genomic data confirmed a high degree of genetic divergence between Himalayan and Tibetan Plateau wolf populations (Aggarwal et al., 2003), as well as their unique adaptations to high altitude environments (Werhahn et al., 2018). Then, in 2019, the IUCN/SSC Canid Specialist Group recommended using the name C. l. chanco for those populations restricted to the Himalayan range and the Tibetan Plateau (Alvares et al., 2019). Consequently, the remaining Asian wolf populations included before in C. l. chanco were implied to belong to C. l. lupus. Although these taxonomic changes included the Korean wolf population, its validity has yet to be tested using genomic data.
The first time that a Korean wolf was included in a molecular analysis was by Ishiguro et al. (2009Ishiguro et al. ( , 2010, where the mitochondrial control region (751 bp) was used to study the extinct Japanese wolf (C. l. hodophilax) and Ezo wolf from Hokkaido (C. l. hattai). The Korean wolf sample used in that study is described as a C. l. chanco and it was collected from the Hasebe Collection at the University of Tokyo Museum. In the results presented by Ishiguro et al. (2009Ishiguro et al. ( , 2010 the Korean wolf seems to be closely related to Chinese and Mongolian wolves, as would be expected based on geography. However the variation in the mitochondrial control region has limited resolution, hence additional work including autosomal variation is needed to truly evaluate the affiliation of original Korean wolves. Here, we generated whole-genome sequencing data from a captive wolf, putatively North Korean, as well as a historical wolf from South Korea, to an average depth coverage of 25.29× and 7.25× respectively (Table S1). Additionally, we resequenced 10 wolves from poorly characterised populations in Russia, Mongolia and Kazakhstan, and a dead Mongolian captive wolf from Zoo Zürich to an average depth coverage of 5-7×, to obtain a better representation of Asian wolf populations (Table S1). The historical Korean wolf (HKW) genome was obtained from a female specimen housed at the Kyungpook National University Natural History Museum, Gunwi-gun, South Korea ( Figure S9). This individual dates to the beginning of the 20th century (before 1945), and given its provenance, it likely represents one of the last wolf populations in South Korea. The modern genome was obtained from a female captive wolf, originally located in the Pyongyang Central Zoo, North Korea, and subsequently transferred to Seoul Grand Park in 2005. Her wild origin is ultimately uncertain. Using these samples, we aimed to resolve the taxonomic ambiguity of the Korean wolf, searching for the most closely related continental wolf population and analysing its level of genetic differentiation.
Moreover, we endeavour to track the potential origin of the captive wolf to confirm if its lineage came from inside or outside of the Korean Peninsula.

| Description of wolf samples
Three subsamples of the HKW were collected from the skin of a female mounted specimen housed at the Kyungpook National University Natural History Museum, Gunwi-gun, South Korea ( Figure S9). Blood samples were taken from a female captive wolf (PZW) from Seoul Grand Park that was transferred from the were also collected from Russian wolf populations and provided by collaborators. Samples from muscle, cheek tissue, and skin corresponding to MW561, MW574, and MW588 were obtained from contemporary Mongolian wolves.
These data were generated as part of a broader study funded by the Norwegian Environment Agency that aims at looking at the relationship of modern and historic Norwegian wolves to other Eurasian wolves (Stenøien et al., 2021).

| Laboratory processing of modern samples
DNA from modern wolf PZW (SGP-824) was extracted using the DNeasy Blood & Tissue kit (Qiagen), according to the manufacturer's recommendations. DNA quality was assessed by running 1 μL on the Bioanalyzer system (Agilent) to ensure size and analysis of DNA fragments. The concentration of DNA was assessed using the dsDNA BR assay on a Qubit fluorometer (Thermo Fisher Scientific). DNA was converted into double stranded blunt-end libraries with BGI-specific adapters (Mak et al., 2017) using the BEST protocol

| Laboratory processing of historical samples
The HKW sample was processed under strict clean laboratory conditions at the Globe Institute, University of Copenhagen. Three tissue samples were placed into three Eppendorf tubes-1, 2, and 3-and washed with diluted bleach, ethanol and ddH 2 O, following Boessenkool et al. (2017). The material was processed following Gilbert et al. (2007) DNA extraction protocol. Additional treatment with phenol chloroform was performed following Carøe et al. (2018).
The supernatant was then purified using a modified PB buffer and eluted using two washes in 18 μL buffer EB (Qiagen)-with 3 min of incubation time at 37°C (Dabney et al., 2013). The concentration of each extract was checked on a Qubit (ng/μL). BGI libraries were built using 10-20 μL of extract in a final reaction volume of 50 μL following the Santa Cruz Single Stranded protocol (Kapp et al., 2021), and using the "single-tube" library building protocol BEST . Library index amplifications were performed using PfuTurbo Cx HotStart DNA Polymerase (Agilent Technologies) in 50 μL PCR reactions that contained 5 μL of purified library, 0.1 μM of each forward (BGI 2.0) and custom made reverse primers (Mak et al., 2017). The PCR cycling conditions were: initial denaturation at 95°C for 2 min followed by 20 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C for 2 min, and a final elongation step at 72°C for 10 min. Amplified libraries were then purified using 1.8× ratio of MagBio beads to remove adaptor dimers and eluted in 30 μL of EBT (Qiagen) after an incubation for 5 min at 37°C. Amplified libraries were sequenced at Clinomics Inc. with a SE100 mode, and at BGI China with a SR100 mode.

| Data processing
Sequence reads obtained from BGI were mapped to the wolf reference genome  using PALEOMIX v.1.2.13.3 BAM pipeline (Schubert et al., 2014). The wolf reference genome was generated from a highly inbred Swedish wolf, which mitigates issues with reference mapping bias. In brief, adapter trimming was performed using AdapterRemoval v.2.2.0 (Schubert et al., 2016) and only reads with a minimum length of 25 bp were kept. Trimmed reads were mapped to the reference genome using BWA v.0.7.16a backtrack algorithm (Li & Durbin, 2009), disabling the use of a seed parameter. PCR duplicates were identified and removed using Picard MarkDuplicates (Broad Institute, 2023) and, finally, local realignment around indels was performed using GATK v.3.8 3 IndelRealigner module (McKenna et al., 2010).
To evaluate the substitution patterns in the sequences of the HKW DNA, we used mapDamage v.2.0.9 (Jónsson et al., 2013). The aligned sequences resulting from the mapping process were used as input with the default parameters. This step allowed us to authenticate the historical condition of the analysed sample. The HKW DNA sequences showed relatively low levels of fragmentation and deamination patterns typical of ancient DNA damage ( Figure S1).

| Multidimensional scaling plot
We performed single nucleotide polymorphism (SNP) calling by randomly sampling a read for every site for all the samples in our dataset using the option -doHaploCall from ANGSD v.0.931 (Korneliussen et al., 2014). This option allows sampling a random read from each site and each sample instead of performing genotype calling, which allows the incorporation of samples sequenced to lowto medium-depth of coverage. We used the following parameters: doCounts 1 -minMinor 2 -maxMis 7 -C 50 -baq 1 -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -skipTriallelic 1 -doMajorMinor 1. Additionally, bases with quality lower than 20 and mapping quality lower than 30 were discarded. Transitions were removed to avoid aDNA damage that could be found in historical samples. A MAF filter of 0.01 was applied given a final SNPs dataset of 3,284,758 transversion sites. Finally, we restricted the analysis to the scaffolds with at least 1 Mb in size.
Using the previously mentioned SNPs dataset, we estimated pairwise identity-by-state (IBS) genetic distances between the samples, and generated a multidimensional scaling (MDS) analysis using Plink v.1.90 (Chang et al., 2015). A first MDS was estimated including all samples except for the outgroups, and a second MDS was estimated restricting to wolves.

| Admixture analysis
In order to estimate the ancestry components in the HKW and the wolf (PZW) from South Korean Zoo included in our dataset, we used the previously described SNPs panel and ADMIXTURE v.1.3.0 (Alexander et al., 2009). Outgroups were excluded from this analysis. We ran ADMIXTURE assuming 2-10 (K2-K10) ancestry components. Ten replicates for each K value were performed, and the replicate with the best likelihood value was selected. R library Pophelper v.2.3.1 (Francis, 2017) was used to visualise the admixture plots.

| Neighbour-joining tree
A Neighbour-joining tree (NJ) was estimated based on an IBS distance matrix constructed in Plink v.1.90, using the function --distance-matrix. The same SNPs panel described above was used as input to construct the distance matrix. The tree was estimated using R library ape (Paradis & Schliep, 2019) and Interactive Tree Of Life (iTOL) v.4 online tool (Letunic & Bork, 2019) was used to visualise it.

| Maximum likelihood nuclear genome phylogeny
A maximum-likelihood (ML) nuclear genome phylogeny was inferred to explore the evolutionary relationships among the samples from our dataset. The Andean fox and two coyotes were included as outgroups. For each genome, ANGSD v.0.931 was used to generate genomic consensus sequences using the wolf reference genome ("-dofasta2" option). Then, 1000 independent phylogenetic trees were estimated with RAxML-ng v.0.9.0 (Kozlov et al., 2019) under the GTR + G evolutionary model, using 1000 random regions of 5000 bp. All gene trees were concatenated to generate a species tree using ASTRAL-III (Zhang et al., 2018), which was visualised using the Interactive Tree Of Life (iTOL) v.4 online tool (Letunic & Bork, 2019).

| TreeMix
To explore admixture patterns in our HKW, the software TreeMix v.1.13 (Pickrell & Pritchard, 2012) was used to estimate an admixture graph. We included a subset of samples comprising the HKW, the Pyongyang Zoo wolf (PZW), Asian wolves that appeared close to the HKW based on the f 3 -statistics and the estimated NJtree, as well as wolves representing different clades on the tree (Portuguese, Xinjiang-CAN30, Indian-BH6 and Tibet-CAN9A), and Asian dog breeds, excluding other dog breeds to avoid the estimation of migration events among the different dog lineages.
The final subset consisted of 16 wolves, 12 dogs and 2 coyotes used as outgroups. TreeMix was run using the previously created SNPs panel, restricting the analysis to sites without missing data (1,904,538 sites), estimating 0-5 migration events (-m) and grouping SNPs in windows of 500 (-K). Ten replicates were performed for each migration event and the one with the best likelihood was chosen. The final results were plotted on R using TreeMix script plotting_funcs.R.

| Outgroup f 3 -statistics
We calculated outgroup f 3 -statistics using the qp3pop tool from ADMIXTOOLS v.5.1 package (Patterson et al., 2012) to measure the amount of share drift between two populations since their common ancestor. For this analysis we used the coyotes as the outgroup. We estimated the shared drift between all our samples and the two Korean wolves to find their most closely related population. Higher values indicate a closer relationship due to shared drift between the two populations in the test.

| D-statistics
We used our SNPs panel to run D-statistics, as implemented in qpDstat from ADMIXTOOLS v.5.1 (Patterson et al., 2012) to evaluate the possibility of admixture between the historical Korean wolf and dog breeds, or the captive wolf from the Seoul Grand Park . In all the analyses, the Andean fox was used as outgroup. Deviation from 0 was considered statistically significant when the Z-score was below −3 or above 3. The significance of the test was assessed using a weighted block jackknife procedure over 1 Mb blocks.
We then performed genotype calling at the sites of the SNP panel for all samples (N = 71) using BCFtools v.1.12 (Danecek & McCarthy, 2017). Beagle v.5.4 was then used for imputation with default parameters using a sliding window size of 40.0 cM and an overlap of 2 cM between adjacent windows. After applying a MAF filter of 0.05, we obtained a final dataset of 4,036,734 transversion sites. In order to evaluate the accuracy of the estimated imputation, we conducted an MDS analysis, as explained in the MDS plot section, utilising the imputed genotypes. The MDS plot generated from this analysis exhibited the same structure as the one obtained from the SNP dataset prior to the imputation process ( Figure S8A).

| Heterozygosity based on genotype-likelihoods
We used picard-tools to downsample 32 wolf genomes to the same number of mapped reads (~58 million mapped, available for the lowest coverage sample). We estimated genotype likelihoods for each of the samples using ANGSD at the 4,036,734 SNP sites used for estimating heterozygosity with the imputed dataset. Reads with mapping quality below 30, bases with quality below 20 and sites with depth of coverage lower than 3 were not considered.
Genotype-likelihoods were used to estimate the site frequency spectrum (SFS) for each sample independently using ANGSD realSFS (Korneliussen et al., 2014). The per-sample SFS was used as a relative heterozygosity estimate for each sample. We observe a similar pattern to that obtained from the heterozygosity estimates using the imputed dataset ( Figure S8B).

| Heterozygosity per windows and inbreeding coefficient
Imputed genotypes were used to calculate heterozygosity per window using Plink v.2.3.8. Window sizes of 1 Mb were defined on the first 704 scaffolds longer than 1 Mb. The Plink function --sample-counts was implemented to estimate the number of heterozygous sites for each window, and finally, heterozygosity per window ratio was calculated (heterozygous sites/total sites) and the results were visualised using R. To estimate the inbreeding coefficient, we implemented the Plink function --het on the same imputed genotype dataset.

| Korean wolves' population affinities
To explore the population structure in our dataset and assess how the PZW and the HKW relate to wolf and dog populations worldwide, we used a MDS analysis. The results demonstrate that the samples cluster in three main groups: Dogs, highland wolves (repre-  To investigate the phylogenetic relationships of the PZW and HKW, we performed a distance-based NJ tree and the ML phylogeny based on 1000 concatenated trees (Figure 1b; Figure S4). In both approaches, we recover the same clades, although the relationships of the basal branches differ. In the NJ tree, the main clades correspond to North American wolves, Pleistocene wolves, Eurasian wolves, and dogs ( Figure 1b). Wolves are further subdivided into Indian wolves, highland wolves (including PZW), and European and Asian wolves, with HKW placed at the base of the clade with the Liao Ning, Qinghai, and Shanxi wolves (Figure 1b). Overall, the ML phylogeny recovered similar groupings. However, while the North American wolves represent the most basal clade in the NJ tree, the Indian wolves are placed as the most basal clade in the ML phylogeny. These results are consistent with previous studies showing that reconstructing the basal relationships between major wolf clades is challenging due to gene flow between lineages (e.g., Hennelly et al., 2021). In both phylogenetic reconstructions, the PZW is placed close to highland wolves from Qinghai with a high bootstrap support. In contrast, the HKW is placed together with the Liao Ning wolf, the geographically closest population to the Korean Peninsula, and closely related to other East Chinese wolves ( Figure S4).
Altogether, these results suggest that the PZW is not originally from the Korean Peninsula, but instead it is closely related to highland wolves from the Tibetan Plateau. We hypothesise that this specimen was obtained by the Pyongyang Central Zoo via some other route (e.g., exchange with another zoo). Unfortunately, information about this sample is limited. Given the HKW likely represents the original Korean peninsula population and in light of the original questions of this study, for the remaining analyses, we focused mainly on this specimen.

| Dog admixture patterns in the historical Korean wolf
Next, we used outgroup f 3 -statistics to explore more deeply the identified relationships, by evaluating the amount of shared genetic drift between HKW, PZW and other wolf populations. The outgroup f 3 -statistics results for the PZW were in agreement with the clustering and phylogenetic analyses, showing high shared genetic drift with highland wolves from the Tibetan Plateau ( Figure S5). The outgroup f 3 -statistics showed the HKW is most closely related to east Asian wolves, mainly from Central and East China, but also to the highland wolves BH1 and BH4, and the Shar Pei dog (Figure 2a,b). We then used TreeMix (Pickrell & Pritchard, 2012) to look into potential admixture in the HKW suggested by the outgroup f 3 -statistics results. This analysis was performed using a subset of samples, which includes the HKW and the closest Asian wolves as observed in the outgroup f 3 -statistics and NJ tree, as well as wolves representing relevant clades on the tree (Portuguese, Xinjiang-CAN30, Indian-BH6, Tibet-CAN9A and the PZW), and Asian dog breeds. The overall recovered tree topology is in agreement with the NJ and ML trees, and the migration events estimated are in agreement with the admixture patterns observed before (Figure 2c; Figures S6 and S7). When allowing one to three migration events, we find the Tibetan wolves BH1 and BH4 are placed in the same clade as the Indian wolves, but appear to have had gene flow from the HKW lineage and highland wolves, confirming its admixed nature (Figures S6 and S7). The high level of admixture of these two Tibetan wolf samples from Ladakh was already mentioned in the original study where they were described (cf. Hennelly et al., 2021). Finally, the treemix admixture graph shows an admixture event between HKW and the Shar Pei dog, when allowing for an additional migration edge, supporting our outgroup f 3 -statistics results (Figure 2c; Figures S6 and S7).
To formally test for admixture between the HKW and dogs,

| DISCUSS ION
Our results demonstrate that Korean wolves are closely related to East Chinese wolf populations and do not seem to be highly genetically differentiated. This implies that wolf populations from the Korean Peninsula were not geographically isolated from the rest of the continental populations, as was suggested by Abe, who argued that the Yalou River could serve as a geographical barrier (Abe, 1936). Previously published data by Ishiguro et al. (2009Ishiguro et al. ( , 2010 indicated similar results when analysing the mitochondrial control   probably not related to topographic barriers like the Mongolian plateau, the Great Khingan Mountains, the Lesser Khingan Mountains, or the Changbai Mountains, but to ecological affinities. It has been observed before the existence of significant isolation by distance among neighbouring wolf populations (Geffen et al., 2004), which has been associated with ecological variables such as vegetation type, temperature, or even snow cover, among others (Geffen et al., 2004;Pilot et al., 2006). These habitat preferences have been explained in relationship to prey specialisation, due to some of these environmental factors defining the ungulate communities that are the wolves' main prey (Carmichael et al., 2001;Mech & Boitani, 2003;Musiani et al., 2007;Pilot et al., 2006). Similarly, a north-south differentiation pattern among wolf populations has been described before, again, associated with changes in environmental variables (Randi et al., 2001;Stronen et al., 2013). Therefore,  (Ballard et al., 2003;Ciucani et al., 2022;Niemann et al., 2021;Phillips et al., 2003;Randi et al., 2001;vonHoldt et al., 2022). Our data revealed that the HKW is admixed with dogs, with the Shar Pei breed being the most likely source of ancestry contributing to the genetic composition of this specimen. The relatively high estimated heterozygosity levels across the genome of the HKW, as well as its low inbreeding coefficient estimation, could be explained by admixture with dogs. We could expect to find low heterozygosity levels and inbreeding in populations in decline, as can be observed in the case of the Honshu wolf from Japan, which suffered a similar history of decrease in population size and subsequent extinction (Ishiguro et al., 2009;Matsumura et al., 2014;Niemann et al., 2021). Strong admixture signatures are more common in small and isolated populations, which could be the situation for Korean wolves. However, there is a possibility that the admixture pattern found here represents a particular case of a wolf individual with higher or more recent dog admixture than the average in the Korean wolf populations. To truly elucidate the extent and frequency of dog admixture in the wolf populations from the Korean Peninsula, it is necessary to expand the analysis to a larger number of historical specimens from across this region.
In the same way, our results indicate that the Chinese Shar Pei breed is admixed with wolves, mainly from Asia. Taking into consideration that Shar Pei was close to extinction during the 20th century due to its prohibition after the Chinese Communist Revolution (The Editors of Encyclopedia Britannica, 2023), this breed likely went through a population bottleneck, reducing its genetic diversity and favouring admixture with wolves. According to historical records about the Shar Pei dog, the breed survived in Taiwan and Hong Kong, making it unlikely that the main wolf source of admixture was wolves from Korea. Therefore, we consider that the high affinity between the HKW and the Shar Pei dog observed in the D-statistics is due to the Shar Pei ancestry in the Koren wolf rather than the other way around.
Our findings have the potential to contribute to wolf conservation strategies, both at a local level for the reintroduction of wolves in the Korean Peninsula and at a regional level. In the context of the

DATA AVA I L A B I L I T Y S TAT E M E N T
Generated raw sequence reads are deposited to the European Nucleotide Archive (Project ID: PRJEB57591).

B EN EFIT-S H A R I N G S TATEM ENT
The research work presented here was the result of an international collaboration that involved the participation and co-authorship of scientists from the different countries providing the wolf samples.
The results produced by this collaboration can directly impact on the local and regional conservation strategies of wolf populations.